Eye Tracking


Supplementary Material: Access and Benchmark

Neural Information Processing Systems

Figure 10: Illustration of the frame-based pupil segmentation: (a) the input eye image I; (b) the generated binary mask M; and (c) the detected pupil boundary Q and the pupil center c.

C More Details on the Experiments

C.1 Evaluation metrics

The four metrics adopted for the dataset evaluation are described in detail as follows:
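
The caption's pipeline (image I, mask M, boundary Q, center c) can be made concrete with a minimal sketch, assuming a simple dark-pupil threshold with OpenCV; the threshold and kernel size are placeholder assumptions, not the dataset's actual segmentation model.

```python
import cv2
import numpy as np

def segment_pupil(eye_gray, thresh=40):
    """Eye image I -> binary mask M -> pupil boundary Q and center c."""
    # The pupil is the darkest region, so invert-threshold to get mask M.
    _, M = cv2.threshold(eye_gray, thresh, 255, cv2.THRESH_BINARY_INV)
    # Morphological opening removes small speckle from eyelashes and glints.
    M = cv2.morphologyEx(M, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(M, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return M, None, None
    Q = max(contours, key=cv2.contourArea)       # pupil boundary Q
    (cx, cy), _ = cv2.minEnclosingCircle(Q)      # pupil center c
    return M, Q, (cx, cy)
```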


EV-Eye: Rethinking High-frequency Eye Tracking through the Lenses of Event Cameras

Neural Information Processing Systems

In this paper, we present EV-Eye, a first-of-its-kind large-scale multimodal eye tracking dataset aimed at inspiring research on high-frequency eye/gaze tracking. EV-Eye utilizes the emerging bio-inspired event camera to capture independent pixel-level intensity changes induced by eye movements, achieving sub-microsecond latency.
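
For readers unfamiliar with the modality, the sketch below shows how a stream of pixel-level events, each a (timestamp, x, y, polarity) tuple, might be accumulated into a signed frame for downstream processing; the tuple layout and sensor resolution are illustrative assumptions, not the EV-Eye file format.

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate events (t, x, y, polarity) into a signed 2D frame.

    Each event encodes an independent pixel-level intensity change with a
    microsecond-scale timestamp, which is what enables high-frequency tracking.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:
        if t_start <= t < t_end:
            frame[y, x] += 1 if p > 0 else -1   # +1 brighter, -1 darker
    return frame

# Hypothetical usage: events within a 1 ms window on a 346x260 sensor.
events = [(5, 120, 80, 1), (510, 121, 80, 0), (990, 119, 81, 1)]  # t in us
frame = events_to_frame(events, height=260, width=346, t_start=0, t_end=1000)
```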



MVRS: The Multimodal Virtual Reality Stimuli-based Emotion Recognition Dataset

Mousavi, Seyed Muhammad Hossein, Ilanloo, Atiye

arXiv.org Artificial Intelligence

Automatic emotion recognition has become increasingly important with the rise of AI, especially in fields like healthcare, education, and automotive systems. However, there is a lack of multimodal datasets, particularly those involving body motion and physiological signals, which limits progress in the field. To address this, the MVRS dataset is introduced, featuring synchronized recordings from 13 participants aged 12 to 60 exposed to VR-based emotional stimuli (relaxation, fear, stress, sadness, joy). Data were collected using eye tracking (via webcam in a VR headset), body motion (Kinect v2), and EMG and GSR signals (Arduino UNO), all timestamp-aligned. Participants followed a unified protocol with consent and questionnaires. Features from each modality were extracted, fused using early and late fusion techniques, and evaluated with classifiers to confirm the dataset's quality and emotion separability, making MVRS a valuable contribution to multimodal affective computing.
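
As a rough illustration of the early- versus late-fusion evaluation described above, the sketch below contrasts the two schemes on pre-extracted per-modality features; the choice of classifier and the variable names are assumptions, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def early_fusion(train_parts, y, test_parts):
    """Concatenate modality features first, then train one classifier."""
    clf = RandomForestClassifier().fit(np.hstack(train_parts), y)
    return clf.predict(np.hstack(test_parts))

def late_fusion(train_parts, y, test_parts):
    """Train one classifier per modality, then average their probabilities."""
    probs = []
    for X_tr, X_te in zip(train_parts, test_parts):
        clf = RandomForestClassifier().fit(X_tr, y)
        probs.append(clf.predict_proba(X_te))
    return np.mean(probs, axis=0).argmax(axis=1)

# train_parts / test_parts: [X_eye, X_motion, X_physio] feature matrices
# with aligned rows (one per timestamped segment) and labels y.
```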


WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization

Davalos, Eduardo, Zhang, Yike, Srivastava, Namrata, Thatigotla, Yashvitha, Salas, Jorge A., McFadden, Sara, Cho, Sun-Joo, Goodwin, Amanda, TS, Ashwin, Biswas, Gautam

arXiv.org Artificial Intelligence

With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors like model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, particularly due to head movement. To tackle these issues, we introduce WebEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly in the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples (k ≤ 9). WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14. Our open-source code is available at https://github.com/RedForestAi/WebEyeTrack.
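
The few-shot personalization idea can be made concrete with a generic sketch: fit a lightweight affine correction on top of a frozen base gaze model from the k ≤ 9 calibration samples. This illustrates the general principle only; WebEyeTrack's actual on-device adaptation may differ.

```python
import numpy as np

def fit_affine_correction(pred_xy, true_xy):
    """Least-squares affine map from base predictions to calibrated gaze."""
    A = np.hstack([pred_xy, np.ones((len(pred_xy), 1))])  # rows: [x, y, 1]
    W, *_ = np.linalg.lstsq(A, true_xy, rcond=None)       # (3, 2) affine params
    return W

def apply_correction(pred_xy, W):
    A = np.hstack([pred_xy, np.ones((len(pred_xy), 1))])
    return A @ W

# Hypothetical usage with k = 9 on-screen calibration targets (cm).
base_preds = np.random.rand(9, 2) * 30          # frozen base model outputs
targets = base_preds + np.array([1.5, -0.8])    # toy ground-truth gaze points
W = fit_affine_correction(base_preds, targets)
calibrated = apply_correction(base_preds, W)    # per-user corrected gaze
```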


Detecting Reading-Induced Confusion Using EEG and Eye Tracking

Zhuang, Haojun, Baradari, Dünya, Kosmyna, Nataliya, Balyan, Arnav, Albrecht, Constanze, Chen, Stephanie, Maes, Pattie

arXiv.org Artificial Intelligence

Humans regularly navigate an overwhelming amount of information via text media, whether reading articles, browsing social media, or interacting with chatbots. Confusion naturally arises when new information conflicts with or exceeds a reader's comprehension or prior knowledge, posing a challenge for learning. In this study, we present a multimodal investigation of reading-induced confusion using EEG and eye tracking. We collected neural and gaze data from 11 adult participants as they read short paragraphs sampled from diverse, real-world sources. By isolating the N400 event-related potential (ERP), a well-established neural marker of semantic incongruence, and integrating behavioral markers from eye tracking, we provide a detailed analysis of the neural and behavioral correlates of confusion during naturalistic reading. Using machine learning, we show that multimodal (EEG + eye tracking) models improve classification accuracy by 4-22% over unimodal baselines, reaching an average weighted participant accuracy of 77.3% and a best accuracy of 89.6%. Our results highlight the dominance of the brain's temporal regions in these neural signatures of confusion, suggesting avenues for wearable, low-electrode brain-computer interfaces (BCI) for real-time monitoring. These findings lay the foundation for developing adaptive systems that dynamically detect and respond to user confusion, with potential applications in personalized learning, human-computer interaction, and accessibility.
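
As an illustration of the multimodal feature construction, the sketch below extracts a mean-amplitude feature from the 300-500 ms post-onset (N400) window of an EEG epoch and concatenates it with gaze features; the sampling rate, baseline window, and channel count are illustrative assumptions, not the study's recording setup.

```python
import numpy as np

FS = 250    # sampling rate (Hz), assumed
PRE = 0.2   # epoch starts 200 ms before word onset (s), assumed

def n400_amplitude(epoch, fs=FS, pre=PRE, win=(0.3, 0.5)):
    """Mean amplitude in the 300-500 ms post-onset window, baseline-corrected."""
    baseline = epoch[:, : int(pre * fs)].mean(axis=1, keepdims=True)
    corrected = epoch - baseline
    lo, hi = (int((pre + t) * fs) for t in win)
    return corrected[:, lo:hi].mean(axis=1)      # one value per channel

# epoch: (n_channels, n_samples) EEG around one word onset;
# gaze_feats: e.g. fixation duration (s) and regression count for that word.
epoch = np.random.randn(8, int((PRE + 0.8) * FS))
gaze_feats = np.array([0.21, 1.0])
fused = np.concatenate([n400_amplitude(epoch), gaze_feats])  # EEG + gaze vector
```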


Real-Time Sleepiness Detection for Driver State Monitoring System

Ghimire, Deepak, Jeong, Sunghwan, Yoon, Sunhong, Park, Sanghyun, Choi, Juhwan

arXiv.org Artificial Intelligence

A driver face monitoring system can detect driver fatigue, an important factor in a large number of accidents, using computer vision techniques. In this paper we present a real-time technique for driver eye-state detection. First, the face is detected and the eyes are located inside the face region for tracking. A normalized cross-correlation-based online dynamic template matching technique, combined with Kalman filter tracking, is proposed to track the detected eye positions in subsequent image frames. A support vector machine with histogram of oriented gradients (HOG) features is used to classify the state of the eyes as open or closed. If the eye(s) are detected as closed for a specified amount of time, the driver is considered to be sleeping and an alarm is generated.
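
A minimal sketch of the described tracking loop, combining normalized cross-correlation template matching with a Kalman filter and an online template update, might look like the following; the noise covariances, blending rate, and score threshold are illustrative assumptions.

```python
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)                      # state: x, y, vx, vy
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
kf.processNoiseCov = 1e-2 * np.eye(4, dtype=np.float32)
kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)

def track_eye(frame_gray, template, alpha=0.1):
    """One tracking step: predict, match by NCC, correct, update template."""
    kf.predict()
    res = cv2.matchTemplate(frame_gray, template, cv2.TM_CCORR_NORMED)
    _, score, _, (x, y) = cv2.minMaxLoc(res)     # best NCC match location
    kf.correct(np.array([[x], [y]], np.float32))
    h, w = template.shape
    patch = frame_gray[y:y + h, x:x + w]
    if score > 0.8 and patch.shape == template.shape:
        # Online dynamic template update: blend in the new appearance.
        template = cv2.addWeighted(template, 1 - alpha, patch, alpha, 0)
    return (x, y), template
```

The open/closed decision would then run an HOG + SVM classifier on the tracked eye patch each frame, raising the alarm once the closed state persists past the time threshold.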


AI-Based Screening for Depression and Social Anxiety Through Eye Tracking: An Exploratory Study

Chlasta, Karol, Wisiecka, Katarzyna, Krejtz, Krzysztof, Krejtz, Izabela

arXiv.org Artificial Intelligence

Well-being is a dynamic construct that evolves over time and fluctuates within individuals, presenting challenges for accurate quantification. Reduced well-being is often linked to depression or anxiety disorders, which are characterised by biases in visual attention towards specific stimuli, such as human faces. This paper introduces a novel approach to AI-assisted screening of affective disorders by analysing visual attention scan paths using convolutional neural networks (CNNs). Data were collected from two studies examining (1) attentional tendencies in individuals diagnosed with major depression and (2) social anxiety. These data were processed using residual CNNs through images generated from eye-gaze patterns. Experimental results, obtained with ResNet architectures, demonstrated an average accuracy of 48% for a three-class system and 62% for a two-class system. Based on these exploratory findings, we propose that this method could be employed in rapid, ecological, and effective mental health screening systems to assess well-being through eye-tracking.
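
The scan-path-to-image step can be sketched as follows: fixations are rendered as an image (path plus duration-scaled markers) and passed to a ResNet classifier. The image size, drawing style, and two-class head are assumptions for illustration, not the study's exact rendering.

```python
import torch
from PIL import Image, ImageDraw
from torchvision import models, transforms

def scanpath_to_image(fixations, size=224):
    """Draw a gaze scan path (normalized x, y, duration) as an RGB image."""
    img = Image.new("RGB", (size, size), "black")
    draw = ImageDraw.Draw(img)
    pts = [(x * size, y * size) for x, y, _ in fixations]
    draw.line(pts, fill="white", width=2)                 # saccade path
    for (px, py), (_, _, dur) in zip(pts, fixations):
        r = 2 + 10 * dur                                  # radius encodes duration
        draw.ellipse([px - r, py - r, px + r, py + r], outline="red")
    return img

model = models.resnet18(weights=None, num_classes=2)      # two-class screening head
fixations = [(0.2, 0.3, 0.15), (0.5, 0.4, 0.30), (0.7, 0.6, 0.10)]
x = transforms.ToTensor()(scanpath_to_image(fixations)).unsqueeze(0)
logits = model(x)   # train with cross-entropy on labeled scan-path images
```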


FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality

Ding, Junyuan, Wang, Ziteng, Gao, Chang, Liu, Min, Chen, Qinyu

arXiv.org Artificial Intelligence

Eye tracking is a key technology for gaze-based interactions in Extended Reality (XR), but traditional frame-based systems struggle to meet XR's demands for high accuracy, low latency, and power efficiency. Event cameras offer a promising alternative due to their high temporal resolution and low power consumption. In this paper, we present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data, optimized for real-time XR applications. The ellipse output can be used directly in subsequent ellipse-based pupil trackers. We enhance the EV-Eye dataset by expanding the annotated data and converting the original mask labels to ellipse-based annotations to train the model. In addition, a novel trigonometric loss is adopted to address angle discontinuities, and a fast causal event-volume representation method is proposed. On the enhanced EV-Eye test set, FACET achieves an average pupil center error of 0.20 pixels and an inference time of 0.53 ms, reducing pixel error and inference time by 1.6$\times$ and 1.8$\times$ compared to the prior art, EV-Eye, with 4.4$\times$ and 11.7$\times$ fewer parameters and arithmetic operations. The code is available at https://github.com/DeanJY/FACET.
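
The angle discontinuity that the trigonometric loss addresses arises because an ellipse's orientation is only defined modulo 180 degrees; a common remedy is to compare angles through the cosine of 2θ. The sketch below shows one such loss; FACET's exact formulation may differ.

```python
import torch

def trig_angle_loss(theta_pred, theta_gt):
    """Smooth, periodic orientation loss: zero iff angles match modulo pi.

    Using cos(2 * delta) makes the loss invariant to the 180-degree ellipse
    symmetry and avoids the wrap-around jump of a plain L1/L2 angle error.
    """
    return (1.0 - torch.cos(2.0 * (theta_pred - theta_gt))).mean()

theta_pred = torch.tensor([0.05, 3.10])       # radians
theta_gt = torch.tensor([3.19, 0.00])         # same orientations modulo pi
print(trig_angle_loss(theta_pred, theta_gt))  # close to zero, as desired
```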